
ML-As-2

Point Estimation

The Poisson distribution is a useful discrete distribution for modeling the number of occurrences of an event per unit time. For example, in networking, packet arrival counts are often modeled with the Poisson distribution. If $X$ is Poisson distributed, i.e., $X \sim \text{Poisson}(\lambda)$, its probability mass function takes the following form:

$$P(X\mid\lambda)=\frac{\lambda^{X}e^{-\lambda}}{X!}$$

It can be shown that $E(X)=\lambda$. Assume now we have $n$ i.i.d. data points from $\text{Poisson}(\lambda)$: $D=\{X_1,\ldots,X_n\}$. (For the purpose of this problem, you can only use the knowledge about the Poisson and Gamma distributions provided in this problem.)

(a)

Show that the sample mean $\hat\lambda=\frac{1}{n}\sum_{i=1}^{n}X_i$ is the maximum likelihood estimate (MLE) of $\lambda$ and that it is unbiased ($E[\hat\lambda]=\lambda$).

Finding the MLE

$$L(\lambda)=\prod_{i=1}^{n}P(X_i\mid\lambda)=\prod_{i=1}^{n}\frac{\lambda^{X_i}e^{-\lambda}}{X_i!}$$

$$\ln L(\lambda)=\sum_{i=1}^{n}\bigl(X_i\ln\lambda-\lambda-\ln(X_i!)\bigr)$$

$$\frac{d}{d\lambda}\ln L(\lambda)=\sum_{i=1}^{n}\Bigl(\frac{X_i}{\lambda}-1\Bigr)=0\;\Longrightarrow\;\sum_{i=1}^{n}X_i=n\lambda\;\Longrightarrow\;\hat\lambda=\frac{1}{n}\sum_{i=1}^{n}X_i$$

Unbiasedness

$$E(\hat\lambda)=E\Bigl(\frac{1}{n}\sum_{i=1}^{n}X_i\Bigr)$$

By linearity of expectation, and since the $X_i$ are identically distributed with mean $\lambda$, we can take the expectation inside the sum:

$$E(\hat\lambda)=\frac{1}{n}\sum_{i=1}^{n}E(X_i)=\frac{1}{n}\sum_{i=1}^{n}\lambda=\frac{n\lambda}{n}=\lambda$$

Therefore, $E(\hat\lambda)=\lambda$, confirming that $\hat\lambda$ is an unbiased estimator of $\lambda$.
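As a quick numerical sanity check (a minimal sketch, not part of the assignment; the true rate $\lambda=3$ and the sample sizes are arbitrary illustrations), the snippet below confirms that maximizing the Poisson log-likelihood numerically recovers the sample mean, and that averaging $\hat\lambda$ over many repeated samples comes out close to $\lambda$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(0)
lam_true = 3.0          # arbitrary true rate for the check
n = 50                  # sample size

X = rng.poisson(lam_true, size=n)

# Negative log-likelihood of Poisson(lambda) for the sample X
def neg_log_lik(lam):
    return -np.sum(X * np.log(lam) - lam - gammaln(X + 1))

# Numerical maximizer of the likelihood vs. the closed-form MLE (sample mean)
mle_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 20), method="bounded").x
print("numerical MLE:", mle_numeric, " sample mean:", X.mean())

# Unbiasedness: average the estimator over many independent samples
estimates = rng.poisson(lam_true, size=(10000, n)).mean(axis=1)
print("mean of lambda_hat over 10000 trials:", estimates.mean(), " true lambda:", lam_true)
```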

(b)

Now let's be Bayesian and put a prior distribution over $\lambda$. Assume that $\lambda$ follows a Gamma distribution with parameters $(\alpha,\beta)$; its probability density function is:

$$p(\lambda\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$$

where $\Gamma(\alpha)=(\alpha-1)!$ (here we assume $\alpha$ is a positive integer). Compute the posterior distribution of $\lambda$.

$$P(\lambda\mid X)=\frac{P(X\mid\lambda)\,p(\lambda\mid\alpha,\beta)}{P(X)}$$

$$P(\lambda\mid X)\propto P(X\mid\lambda)\,p(\lambda\mid\alpha,\beta)=\frac{\lambda^{X}e^{-\lambda}}{X!}\cdot\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$$

$$P(\lambda\mid X)\propto\lambda^{X+\alpha-1}e^{-\lambda(\beta+1)}$$

Let $\alpha'=X+\alpha$ and $\beta'=\beta+1$. Then the posterior is still a Gamma distribution: $\lambda\mid X\sim\text{Gamma}(\alpha',\beta')$.
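As a hedged numerical check of this conjugacy (the single observation $x=4$ and the prior parameters $\alpha=2$, $\beta=1$ are made-up illustrative values), one can compare the grid-normalized product of the Poisson likelihood and the Gamma prior against the closed-form $\text{Gamma}(x+\alpha,\,\beta+1)$ density:

```python
import numpy as np
from scipy.stats import poisson, gamma

x_obs, alpha, beta = 4, 2.0, 1.0     # illustrative observation and prior parameters

lam_grid = np.linspace(1e-3, 20, 2000)

# Unnormalized posterior: Poisson likelihood times Gamma(alpha, beta) prior
unnorm = poisson.pmf(x_obs, lam_grid) * gamma.pdf(lam_grid, a=alpha, scale=1.0 / beta)
posterior_numeric = unnorm / np.trapz(unnorm, lam_grid)

# Closed-form posterior: Gamma(x + alpha, beta + 1)
posterior_closed = gamma.pdf(lam_grid, a=x_obs + alpha, scale=1.0 / (beta + 1))

print("max abs difference:", np.abs(posterior_numeric - posterior_closed).max())
```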

(c)

Derive an analytic expression for the maximum a posteriori (MAP) estimate of $\lambda$ under a $\text{Gamma}(\alpha,\beta)$ prior.

$$\lambda_{\mathrm{MAP}}=\arg\max_{\lambda}P(\lambda\mid D)=\arg\max_{\lambda}\frac{\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)}{P(D)}=\arg\max_{\lambda}\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)$$

$$=\arg\max_{\lambda}\log\Bigl(\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)\Bigr)=\arg\max_{\lambda}\Bigl(\sum_{i=1}^{n}\log P(X_i\mid\lambda)+\log P(\lambda)\Bigr)$$

Prior Distribution P(λ)

$$P(\lambda\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}\qquad\Longrightarrow\qquad\log P(\lambda\mid\alpha,\beta)\propto(\alpha-1)\log\lambda-\beta\lambda$$

Likelihood function P(Xi|λ)

$$P(X\mid\lambda)=\frac{\lambda^{X}e^{-\lambda}}{X!}\qquad\Longrightarrow\qquad\log P(X_i\mid\lambda)\propto X_i\log\lambda-\lambda$$

Combining the likelihood terms with the log prior:

$$\log P(\lambda\mid D)\propto\sum_{i=1}^{n}\log P(X_i\mid\lambda)+\log P(\lambda)\propto\sum_{i=1}^{n}X_i\log\lambda-n\lambda+(\alpha-1)\log\lambda-\beta\lambda=\log\lambda\Bigl(\sum_{i=1}^{n}X_i+\alpha-1\Bigr)-\lambda(n+\beta)$$

$$\frac{d}{d\lambda}\log P(\lambda\mid D)=\frac{\sum_{i=1}^{n}X_i+\alpha-1}{\lambda}-(n+\beta)=0\;\Longrightarrow\;\lambda_{\mathrm{MAP}}=\frac{\sum_{i=1}^{n}X_i+\alpha-1}{n+\beta}$$
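A small sketch to check this closed form against a direct numerical maximization of the log-posterior (the data and the prior parameters below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
alpha, beta = 3.0, 2.0                     # illustrative Gamma prior parameters
X = rng.poisson(5.0, size=40)              # illustrative Poisson data
n = X.size

# Log-posterior up to a constant: (sum_i X_i + alpha - 1) log(lam) - (n + beta) lam
def neg_log_post(lam):
    return -((X.sum() + alpha - 1) * np.log(lam) - (n + beta) * lam)

map_numeric = minimize_scalar(neg_log_post, bounds=(1e-6, 20), method="bounded").x
map_closed = (X.sum() + alpha - 1) / (n + beta)
print("numerical MAP:", map_numeric, " closed form:", map_closed)
```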

Source of Error: Part 1

(a)

The bias of an estimator is defined as $E[\hat\mu]-\mu$.

The bias is $1-\mu$.

The variance of an estimator is defined as $\mathrm{Var}(\hat\mu)=E\bigl[(\hat\mu-E[\hat\mu])^{2}\bigr]$.

$$\mathrm{Var}(\hat\mu)=0$$

This is not a good estimator, since the bias is large when the true value of μ is not 1. Usually we don’t have any information about the true value of μ, so it is unreasonable to assume it is equal to 1.

(b)

Since $E(\hat\mu)=\mu$, the bias is 0, so this is an unbiased estimator. The variance of this estimator is $\mathrm{Var}(\hat\mu)=\mathrm{Var}(y_1)=1$.

This is not a good estimator since its variability does not decrease with the sample size.
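A quick simulation illustrates the point; this sketch assumes unit-variance Gaussian observations, consistent with $\mathrm{Var}(y_1)=1$ above, and the mean $\mu=2$ is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, trials = 2.0, 20000

for n in (5, 50, 500):
    y = rng.normal(mu, 1.0, size=(trials, n))
    # Var(y1) stays near 1 regardless of n; the sample mean's variance shrinks like 1/n
    print(f"n={n:4d}  Var(y1) = {y[:, 0].var():.3f}   Var(sample mean) = {y.mean(axis=1).var():.3f}")
```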

(c)

Setting the derivative of the regularized objective $\sum_{i}(y_i-\mu)^2+\lambda\mu^2$ to zero:

$$-2\sum_{i=1}^{n}(y_i-\mu)+2\lambda\mu=0\;\Longrightarrow\;\hat\mu=\frac{1}{n+\lambda}\sum_i y_i=\frac{n}{n+\lambda}\bar y$$

$$E[\hat\mu]=\frac{1}{n+\lambda}E\Bigl[\sum_i y_i\Bigr]=\frac{n}{n+\lambda}\mu$$

Bias of the estimator :

$$\mathrm{bias}=E[\hat\mu]-\mu=-\frac{\lambda\mu}{n+\lambda}$$

Variance of the estimator :

$$\mathrm{Var}(\hat\mu)=\mathrm{Var}\Bigl(\frac{1}{n+\lambda}\sum_i y_i\Bigr)=\frac{1}{(n+\lambda)^{2}}\sum_i\mathrm{Var}(y_i)=\frac{n}{(n+\lambda)^{2}}\sigma^{2}$$
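These bias and variance expressions can be checked by simulation (a sketch; the values $\mu=2$, $\sigma=1$, $n=20$, $\lambda=5$ are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, lam, trials = 2.0, 1.0, 20, 5.0, 200000

y = rng.normal(mu, sigma, size=(trials, n))
mu_hat = y.sum(axis=1) / (n + lam)          # regularized estimator (n/(n+lam)) * y_bar

print("empirical bias:   ", mu_hat.mean() - mu)
print("theoretical bias: ", -lam * mu / (n + lam))
print("empirical var:    ", mu_hat.var())
print("theoretical var:  ", n * sigma**2 / (n + lam) ** 2)
```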

Source of Error: Part 2

(a)

(b)

The error is equal to 0.

Because p(X|Y=0) and p(X|Y=1) do not overlap.

To classify a point, just check whether it lies in the interval $[-4,-1]$ or in the interval $[1,4]$.

(c)

$$P[\text{error}]=P[x\in[0,1]]\times P[\text{error}\mid x\in[0,1]]=\bigl(P[x\in[0,1]\mid y=0]P[y=0]+P[x\in[0,1]\mid y=1]P[y=1]\bigr)\times P[\text{error}\mid x\in[0,1]]$$

$$=\Bigl(\frac{1}{4}\times\frac{1}{2}+\frac{1}{4}\times\frac{1}{2}\Bigr)\times\frac{1}{2}=\frac{1}{8}$$

(d)

  • $E[X\mid Y=0]=-2.5$ and $\mathrm{Var}[X\mid Y=0]=\frac{3}{4}$ (using the variance formula for the uniform distribution),
  • $E[X\mid Y=1]=2.5$ and $\mathrm{Var}[X\mid Y=1]=\frac{3}{4}$.

Since we are approximating p(X|Y) using a normal distribution, we have:

  • $\hat p(X\mid Y=0)=\mathcal N(-2.5,\,0.75)$,
  • $\hat p(X\mid Y=1)=\mathcal N(2.5,\,0.75)$.

Using these, for $x<0$ we find $\hat p(X\mid Y=0)>\hat p(X\mid Y=1)$, and for $x>0$, $\hat p(X\mid Y=0)<\hat p(X\mid Y=1)$. Therefore, the classifier will make no error in classifying new points.
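The zero-error claim can be checked directly with the two fitted Gaussians; this sketch assumes, as in part (b) above, that the true class-conditionals are uniform on $[-4,-1]$ for $Y=0$ and $[1,4]$ for $Y=1$, with equal priors:

```python
import numpy as np
from scipy.stats import norm

# Gaussian approximations fitted in part (d): N(-2.5, 0.75) and N(2.5, 0.75)
g0 = norm(loc=-2.5, scale=np.sqrt(0.75))
g1 = norm(loc=2.5, scale=np.sqrt(0.75))

# Sample x from the supports assumed in part (b): [-4,-1] (Y=0) and [1,4] (Y=1)
rng = np.random.default_rng(4)
x0 = rng.uniform(-4, -1, size=100000)
x1 = rng.uniform(1, 4, size=100000)

# Classify by comparing the approximate class-conditional densities (equal priors)
err0 = np.mean(g0.pdf(x0) < g1.pdf(x0))    # Y=0 points assigned to class 1
err1 = np.mean(g1.pdf(x1) < g0.pdf(x1))    # Y=1 points assigned to class 0
print("error rate:", 0.5 * (err0 + err1))  # expected to be 0
```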

(e)

Given a finite amount of data, we will not learn the mean and variance of $p(X\mid Y)$ perfectly. Therefore, the classifier's error will increase due to the limited data. In this scenario, the model's error would include both a bias and a variance component.

Gaussian (Naïve) Bayes and Logistic Regression

No, the new $P(Y\mid X)$ no longer has the form used by logistic regression.

$$P(Y=1\mid X)=\frac{P(Y=1)P(X\mid Y=1)}{P(Y=1)P(X\mid Y=1)+P(Y=0)P(X\mid Y=0)}=\frac{1}{1+\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)P(X\mid Y=1)}}$$

$$=\frac{1}{1+\exp\Bigl(\ln\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)P(X\mid Y=1)}\Bigr)}=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\ln\frac{P(X\mid Y=0)}{P(X\mid Y=1)}\Bigr)}=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\Bigr)}$$

The log ratio of the class-conditional probabilities is:

$$\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}=\sum_i\ln\frac{\frac{1}{\sqrt{2\pi}\,\sigma_{i0}}\exp\Bigl(-\frac{(X_i-\mu_{i0})^{2}}{2\sigma_{i0}^{2}}\Bigr)}{\frac{1}{\sqrt{2\pi}\,\sigma_{i1}}\exp\Bigl(-\frac{(X_i-\mu_{i1})^{2}}{2\sigma_{i1}^{2}}\Bigr)}$$

Simplifies to:

$$=\sum_i\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\sum_i\Bigl(\frac{(X_i-\mu_{i1})^{2}}{2\sigma_{i1}^{2}}-\frac{(X_i-\mu_{i0})^{2}}{2\sigma_{i0}^{2}}\Bigr)=\sum_i\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\sum_i\frac{(\sigma_{i0}^{2}-\sigma_{i1}^{2})X_i^{2}+2(\mu_{i0}\sigma_{i1}^{2}-\mu_{i1}\sigma_{i0}^{2})X_i+\mu_{i1}^{2}\sigma_{i0}^{2}-\mu_{i0}^{2}\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}$$

Substituting into $P(Y=1\mid X)$:

$$P(Y=1\mid X)=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\Bigr)}$$

Simplifies to:

$$P(Y=1\mid X)=\frac{1}{1+\exp\bigl(w_0+\sum_i w_i X_i+\sum_i v_i X_i^{2}\bigr)}$$

where

$$w_0=\ln\frac{1-\pi}{\pi}+\sum_i\Bigl(\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\frac{\mu_{i1}^{2}\sigma_{i0}^{2}-\mu_{i0}^{2}\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}\Bigr),\qquad w_i=\frac{\mu_{i0}\sigma_{i1}^{2}-\mu_{i1}\sigma_{i0}^{2}}{\sigma_{i0}^{2}\sigma_{i1}^{2}},\qquad v_i=\frac{\sigma_{i0}^{2}-\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}$$

Because of the quadratic term $\sum_i v_i X_i^{2}$, which vanishes only when $\sigma_{i0}=\sigma_{i1}$ for all $i$, $P(Y=1\mid X)$ is no longer the linear-in-$X$ sigmoid form used by logistic regression.
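As a numerical check of the quadratic form (a sketch with made-up Gaussian naive Bayes parameters), the snippet below evaluates $P(Y=1\mid X)$ both from the coefficients $w_0$, $w_i$, $v_i$ derived above and directly from Bayes' rule, and the two agree; note that $v_i\neq 0$ whenever $\sigma_{i0}\neq\sigma_{i1}$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# Illustrative Gaussian naive Bayes parameters with class-dependent variances
pi = 0.4                                                # P(Y=1)
mu0, mu1 = np.array([0.0, 1.0]), np.array([2.0, -1.0])  # class means per feature
s0, s1 = np.array([1.0, 0.5]), np.array([2.0, 1.5])     # sigma_{i0}, sigma_{i1}

# Coefficients of the quadratic form derived above
w0 = np.log((1 - pi) / pi) + np.sum(np.log(s1 / s0)
        + (mu1**2 * s0**2 - mu0**2 * s1**2) / (2 * s0**2 * s1**2))
w = (mu0 * s1**2 - mu1 * s0**2) / (s0**2 * s1**2)
v = (s0**2 - s1**2) / (2 * s0**2 * s1**2)

def posterior_quadratic(x):
    # P(Y=1|X) written with the quadratic exponent w0 + w.x + v.x^2
    return 1.0 / (1.0 + np.exp(w0 + np.dot(w, x) + np.dot(v, x**2)))

def posterior_bayes(x):
    # Direct Bayes' rule with the naive (independent-feature) Gaussian likelihoods
    p1 = pi * np.prod(norm.pdf(x, mu1, s1))
    p0 = (1 - pi) * np.prod(norm.pdf(x, mu0, s0))
    return p1 / (p0 + p1)

x = rng.normal(size=2)
print(posterior_quadratic(x), posterior_bayes(x))   # the two values agree
```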